| Name | Version | Summary | Date |
| --- | --- | --- | --- |
| SurvivalEVAL | 0.4.4 | The most comprehensive Python package for evaluating survival analysis models. | 2025-07-12 22:46:15 |
| rag-evaluation | 0.2.1 | A robust Python package for evaluating Retrieval-Augmented Generation (RAG) systems. | 2025-07-12 22:38:32 |
| novaeval | 0.3.2 | A comprehensive, open-source LLM evaluation framework for testing and benchmarking AI models. | 2025-07-12 20:19:59 |
| RadEval | 0.0.1rc0 | All-in-one metrics for evaluating AI-generated radiology text. | 2025-07-12 17:31:40 |
| pypitest-radeval | 0.0.3 | All-in-one metrics for evaluating AI-generated radiology text. | 2025-07-11 14:54:29 |
| langsmith | 0.4.5 | Client library for connecting to the LangSmith LLM tracing and evaluation platform. | 2025-07-10 22:08:04 |
| AgentDS-Bench | 1.2.2 | Python client for AgentDS-Bench, a streamlined benchmarking platform for evaluating AI agent capabilities in data science tasks. | 2025-07-09 21:21:17 |
| agenta | 0.49.3 | SDK for agenta, an open-source LLMOps platform. | 2025-07-09 13:29:26 |
| open-rag-eval | 0.2.0 | A Python package for RAG evaluation. | 2025-07-08 17:20:26 |
| benchwise | 0.1.0a1 | "The GitHub of LLM Evaluation" Python SDK. | 2025-07-08 10:16:01 |
| guidellm | 0.2.1 | Guidance platform for deploying and managing large language models. | 2025-04-29 17:49:39 |
| evo | 1.31.1 | Python package for the evaluation of odometry and SLAM. | 2025-03-20 15:37:42 |
| ragmetrics-client | 0.1.9 | Monitor your LLM calls and test your LLM app. | 2025-03-14 23:05:52 |
| math-verify | 0.7.0 | Hugging Face library for verifying mathematical answers. | 2025-02-27 16:21:04 |
| trajectopy | 2.4.2 | Trajectory evaluation in Python. | 2025-02-26 08:34:59 |
| quotientai | 0.1.9 | CLI for evaluating large language models with Quotient. | 2025-02-25 18:40:21 |
| python-lilypad | 0.0.23 | An open-source prompt engineering framework. | 2025-02-25 03:25:39 |
| providentia | 2.4.0 | Providentia enables on-the-fly, offline, and interactive analysis of experiment outputs against processed observational data. | 2025-02-12 13:36:50 |
| maihem | 1.7.3 | LLM evaluations and synthetic data generation with the MAIHEM models. | 2025-02-11 16:54:39 |
| trust_eval | 0.1.5 | Metric for measuring RAG responses with inline citations. | 2025-02-11 04:42:29 |